To understand the exchange of data between systems, we may first consider three conceptual models
for that exchange. The first model relies on a central data structure for passing data among
nodes. This is the model commonly used in the meteorology and oceanography communities. The
second model is more formal and relies on instances of a common data model. Nodes exchange
data with an instance of a common database, and data is replicated between the common database
instances. The third conceptual model relies on wrapper software that encapsulates the data
asset. Applications query the data asset using an intermediate layer, sometimes called an
integrator or mediator, to identify the required data asset. The mediator then deals with
critical data issues such as the consolidation of parameter codes, units, replicate data, metadata
content and multiple structures. The resulting data is provided to the user as a coherent and
internally consistent data set.

All of these models support data sharing between nodes. The ICES/IOC¹ Study Group on the
Development of Marine Data Exchange Systems Using XML (SGXML) examined numerous
issues that are important for the sharing of data [1]. In particular, SGXML examined issues
related to metadata, parameter dictionaries and data placement in XML structures.

In terms of metadata, the SGXML reviewed numerous international metadata standards for use
with oceanographic data. The SGXML contributed to the mapping between standards by
developing mappings between the Marine Environmental Data Information (MEDI) referral
catalogue system, ISO 19115 and the European Directory of Marine Environmental Data
(EDMED). These mappings are important because they allow systems to convert metadata
records from one standard to another. Such conversion is essential when combining data assets
that each use a different metadata standard, or when a record must be converted before it can be used.
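
A mapping of this kind can be pictured as a field-level crosswalk. The sketch below is illustrative only: the field names on both sides are hypothetical placeholders, not the actual element names of MEDI, EDMED or ISO 19115, and the function is not an SGXML product. It shows one possible way to convert a record while reporting the fields that have no counterpart in the target standard.

```python
# Hypothetical crosswalk: source-standard field name -> target-standard field name.
# All names here are placeholders invented for illustration.
CROSSWALK_A_TO_B = {
    "dataset_title": "title",
    "originating_centre": "responsible_party",
    "summary": "abstract",
}


def convert_record(record, crosswalk):
    """Convert a metadata record and report fields that have no mapping."""
    converted = {}
    unmapped = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)  # record the gap instead of silently dropping it
        else:
            converted[target] = value
    return converted, unmapped


record_a = {
    "dataset_title": "Example CTD profiles",
    "originating_centre": "Example Data Centre",
    "sampling_notes": "free text with no counterpart",
}
converted, unmapped = convert_record(record_a, CROSSWALK_A_TO_B)
```

Returning the unmapped fields alongside the converted record makes any loss of metadata content visible to whoever combines the data assets.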

The SGXML also investigated the issue of parameter dictionaries. SGXML contributed to the
development of the BODC² Parameter Dictionary. This is evident from the growth of the BODC
dictionary from 7982 entries in May 2002 to 14431 entries in May 2004. SGXML is
also responsible for an in-depth mapping between the BODC and IFREMER dictionaries and between
BODC and the DONAR/WADI (The Netherlands) data models. Perhaps more importantly, these
mappings have continued in other projects and now encompass about 11 dictionaries in total.

The SGXML also made a contribution in the area of XML data structures. One effort resulted in
the development of the Keeley Bricks [2]. The initial concept for the generic structures was
based on the work of J. Robert Keeley (Marine Environmental Data Service, MEDS) in the
1980s. The initial idea recognized that many data types being delivered to the data centre
contained information parts that were consistent across the data types. It was thought that these
consistent parts could be formalized into structures, or Bricks. The formal Bricks could then be
arranged in multiple ways to address the many structures present in the various ocean data types.

¹ ICES - International Council for the Exploration of the Sea
  IOC - Intergovernmental Oceanographic Commission
² BODC - British Oceanographic Data Centre


This effort resulted in the identification of 20 Bricks. The Bricks cover aspects of oceanographic
data types such as analysis methods, calibration, instrumentation, provenance, unit and variable
definition. A single data structure was then developed from the Bricks and was found to be
capable of storing a diverse set of oceanographic data types, including profile data, current meter
data, underway temperature-salinity data, water sample data, acoustic Doppler current profiling
data (both moored and shipboard) and biological net tow data.
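
The Brick idea can be sketched as a small set of reusable record types that are composed into one generic container. The classes and fields below are hypothetical and far simpler than the 20 Bricks described in [2]; they only illustrate how the same parts might serve both a profile and a current meter record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VariableBrick:
    name: str   # hypothetical field: what was measured
    units: str  # hypothetical field: units of the measurement


@dataclass
class InstrumentBrick:
    instrument_type: str  # hypothetical field: kind of sensor or sampler


@dataclass
class DataSet:
    """One generic structure assembled from Bricks."""
    data_type: str
    instrument: InstrumentBrick
    variables: List[VariableBrick] = field(default_factory=list)
    rows: List[List[float]] = field(default_factory=list)

    def add_row(self, values):
        # Each row must supply exactly one value per declared variable.
        if len(values) != len(self.variables):
            raise ValueError("row length does not match variable list")
        self.rows.append(list(values))


# The same Brick instance is reused in two different data types.
temperature = VariableBrick("temperature", "degC")

profile = DataSet("profile", InstrumentBrick("CTD"),
                  [VariableBrick("pressure", "dbar"), temperature])
profile.add_row([10.0, 12.5])

current_meter = DataSet("current meter", InstrumentBrick("moored current meter"),
                        [VariableBrick("speed", "m/s"), temperature])
current_meter.add_row([0.25, 11.8])
```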

A second data investigation utilized some of the ideas and methods discussed during the SGXML
meetings, applying them to the Tokyo Bay Environmental Information Center Project. An
XML structure and supporting software were developed and used for data collection efforts that
supported the monitoring of Tokyo Bay. This work also utilized components of the Geography
Markup Language (GML).

Another GML-related effort attempted to incorporate all of the Keeley Brick information into
GML. This resulted in a somewhat complicated set of relationships between the Brick content
and the GML structure. GML implementation requires an abstraction of oceanographic data
types, and thus potentially introduces complications in terminology.

There are also efforts underway to integrate data systems within the oceanographic community.
The JCOMM⁴ Expert Team on Data Management Practices (ETDMP) is exploring issues related
to the identification and aggregation of data sets [3]. A funded ETDMP project is developing a
system based on the conceptual wrapper model. The system has multiple layers of data
providers, integrators and user applications. Users define their requirements at the user
application layer. The integrator layer then directs the queries to appropriate data providers. The
data providers retrieve data from the local system, then send the data back to the integrator
layer. The integrator layer will deal with the issues of parameter codes, data replication, etc., and
provide the user with a single data set from the multiple sources.
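
The layered flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ETDMP design: the provider functions, local parameter codes, common code and record identifiers are all hypothetical, and replicated records are detected simply by a shared identifier.

```python
# Hypothetical local parameter codes mapped to one common code.
COMMON_CODE = {"TEMP": "sea_temperature", "T_DEGC": "sea_temperature"}


def provider_a(query):
    # Stand-in for a data provider reading from its local system.
    return [{"id": "obs-1", "code": "TEMP", "value": 12.5},
            {"id": "obs-2", "code": "TEMP", "value": 13.0}]


def provider_b(query):
    # Second provider; obs-2 replicates a record also held by provider_a.
    return [{"id": "obs-2", "code": "T_DEGC", "value": 13.0},
            {"id": "obs-3", "code": "T_DEGC", "value": 14.1}]


def integrate(query, providers):
    """Send the query to each provider and merge the answers into one data set."""
    merged = {}
    for provider in providers:
        for rec in provider(query):
            common = COMMON_CODE[rec["code"]]   # consolidate parameter codes
            if rec["id"] not in merged:         # keep only the first copy of a replicate
                merged[rec["id"]] = {"id": rec["id"],
                                     "parameter": common,
                                     "value": rec["value"]}
    return [merged[key] for key in sorted(merged)]


# The user application layer states its requirement as a query.
result = integrate({"parameter": "sea_temperature"}, [provider_a, provider_b])
```

The user application sees one list of records with a single parameter code, regardless of how many providers answered or which local codes they used.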

In terms of data semantics related to parameter usage vocabularies, the Marine Metadata
Interoperability (MMI) project is making an important contribution to identifying the
relationships between parameters in different dictionaries [4]. These dictionaries, which actually
represent managed vocabularies, are being aligned and mapped into the Web Ontology Language
(OWL) by the MMI project. The OWL implementation allows the searching and discovery of
terms by moving up and down the hierarchy formed by the implementation. By doing so, the
user can find previously unknown terminology in other dictionaries that matches the
search term. In addition, tools being developed under MMI allow users to create and manage groups
of terms for their particular needs. Thus, users may define groups of similar terms, from multiple
dictionaries, that have particular meaning to the user.
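
The kind of hierarchy search described above can be illustrated with a plain dictionary standing in for the ontology. This sketch does not use OWL or any MMI tool; the term names, the dictionary prefixes and the single "narrower term" relation are assumptions made only to show how moving up and then down a hierarchy can surface matching terms from another dictionary.

```python
# Hypothetical "narrower term" links; prefixes mark the source dictionary.
NARROWER = {
    "common:temperature": ["dictA:TEMP", "dictB:sea_water_temp"],
    "dictB:sea_water_temp": ["dictB:sea_surface_temp"],
}


def broader_of(term):
    """Return the terms directly above `term` in the hierarchy."""
    return [parent for parent, children in NARROWER.items() if term in children]


def descendants(term):
    """Return every term below `term`, depth first."""
    found = []
    for child in NARROWER.get(term, []):
        found.append(child)
        found.extend(descendants(child))
    return found


def related_terms(term):
    """Go up one level, then down, to find related terms elsewhere."""
    related = []
    for parent in broader_of(term):
        for candidate in descendants(parent):
            if candidate != term and candidate not in related:
                related.append(candidate)
    return related


matches = related_terms("dictA:TEMP")

# A user-defined group of similar terms drawn from several dictionaries.
my_group = {"my temperature terms": ["dictA:TEMP"] + matches}
```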

Data exchange raises many important issues. Some of the international efforts
addressing particular exchange issues are described in this summary paper. In all of these efforts,
the critical underlying issue is an understanding of the data content (Figure 1). The difficulty in
understanding the content is often related to the supporting metadata. Often, the supporting
metadata descriptions are incomplete or use varied semantic descriptions and different
vocabularies. The assets are also highly distributed and stored in many different data structures
and software formats. All of these factors can contribute to the loss or misinterpretation of the
data content. Only when data exchange is seamless from a semantic perspective will the
exchange problem truly be solved.

Figure 1: Schematic showing the difficulties associated with the discovery process.

Image adapted from "HOW: Hydrologic Ontology for the Web".

⁴ JCOMM - Joint WMO/IOC Commission on Oceanography and Marine Meteorology
  WMO - World Meteorological Organization